Real-time Trajectory Planning and Tracking Control of Bionic Underwater Robot in Dynamic Environment

In this article, we study the trajectory planning and tracking control of a bionic underwater robot in an environment with multiple dynamic obstacles. We first introduce the design of the bionic leopard cabinet underwater robot developed in our lab. Then, we model the trajectory planning problem of the bionic underwater robot by combining its dynamics and physical constraints. Next, we conduct global trajectory planning for the bionic underwater robot based on temporal-spatial Bezier curves. Local trajectory replanning for dynamic obstacle avoidance is then carried out with an improved proximal policy optimization algorithm. In addition, we design a fuzzy proportional-integral-derivative controller for tracking the planned trajectory. Finally, the effectiveness of the real-time trajectory planning and tracking control method is verified by comparative simulation in a dynamic environment and by semiphysical simulation in UWSim. The real-time trajectory planning method has advantages in trajectory length, trajectory smoothness, and planning time, and the trajectory tracking error is kept around 0.2 m.


Introduction
Underwater robots have the capability to perform a wide range of tasks, eliminating the need for human intervention. These tasks include underwater biological observation, pipeline inspection, resource exploration, and underwater reconnaissance [1][2][3][4]. In nature, fish possess exceptional underwater locomotion abilities, allowing them to swim swiftly and efficiently [5]. By emulating the movement patterns of fish, bionic underwater robots exhibit superior mobility, efficiency, and speed compared to traditional propeller-driven underwater robots. These bionic robots also demonstrate improved adaptability to the complex underwater environment [6].
In recent years, substantial advancements have been made in the development of fish-like underwater robots. Simons et al. [7] introduced Galatea, a robotic fish propelled by both pectoral and caudal fins. This design allows for control over depth, movement patterns, and underwater orientation. Zhang et al. [8] presented a dual-caudal-fin robotic fish, which exhibits exceptional stability and speed, capable of reaching a maximum speed of 1.21 BL/s. Wang et al. [9][10][11] developed Robcutt, a third-generation robotic fish specifically designed for deep-sea inspection and other underwater tasks.
The autonomous avoidance of underwater obstacles and efficient trajectory planning are crucial for bionic underwater robots to successfully complete their underwater tasks. Extensive research has been conducted in this area. Sani et al. [12] developed a trajectory controller based on the proportional-integral-derivative (PID) method, which utilizes autonomous underwater vehicle (AUV) position, velocity, and acceleration data to plan trajectories. This method demonstrates high accuracy and robustness. Filaretov et al. [13] employed Bessel curves to generate smooth trajectories for AUVs in obstacle-free environments. Wang et al. [14] designed a real-time dynamic Dubins-Helix method for trajectory planning in obstacle-free environments, resulting in well-smoothed trajectories. Murthy et al. [15] proposed a spline curve-based trajectory planning method to address the terrain following problem for AUVs in obstacle-free environments. Zeng et al. [16] employed B-spline curves to rapidly plan optimal trajectories for underwater robots in static obstacle environments. Li et al. [17] combined cubic B-spline curves with genetic algorithms to design a trajectory planning method for underactuated underwater robots, effectively matching the kinematic capabilities of AUVs. Wang et al. [18] achieved collaborative optimal trajectory planning for multiple robots operating in static obstacle environments by utilizing temporal-spatial Bezier curves. Gan et al. [19] introduced a 2D trajectory planning model for underwater environments that utilizes the repulsive force field method. This approach demonstrates effective trajectory planning results, particularly in scenarios with a mix of dynamic and static obstacles. Reinforcement learning algorithms have enabled underwater robots to interact directly with the environment, offering improved planning effectiveness and adaptability in complex and ever-changing underwater conditions. Wang et al. [20] developed an online learning algorithm based on reinforcement learning, which plans optimal trajectories for AUVs in obstacle-free under-ice environments. Yang et al. [21] devised a value-based reinforcement learning algorithm that allows planned trajectories to adapt to varying oceanic conditions. Hadi et al. [22] designed a trajectory planning method based on deep reinforcement learning algorithms, enhancing the robustness and adaptability of trajectory planning in dynamic obstacle environments.
While the aforementioned trajectory planning methods have yielded positive results, they face challenges in complex and dynamic underwater environments where obstacles are prone to movement. This dynamic nature of underwater obstacles poses greater difficulty for underwater obstacle avoidance. Furthermore, there is a scarcity of studies focusing on trajectory planning for underwater environments with dynamic obstacles. Although reinforcement learning methods exhibit superior adaptability to the environment and perform well in multidynamic obstacle underwater scenarios, there is still room for improvement in terms of planning efficiency, trajectory length, and smoothness.
During the operation of bionic underwater robots, ensuring accurate trajectory tracking and reaching the desired destination is equally crucial, making motion control research of paramount importance. Wang et al. [23] successfully achieved path-point tracking control on a homemade cuttlefish-imitating underwater robot called Robcutt-II by combining a low-complexity guidance system with the backstepping method. Their approach demonstrated excellent accuracy and efficiency. Oh et al. [24] tackled the path-point tracking control problem by combining line-of-sight navigation principles with the model predictive control method. Wang et al. [25] designed a path tracking controller based on active disturbance rejection control, which exhibited higher accuracy and stability compared to traditional PID methods. Chen et al. [26] developed a path tracking controller utilizing a fuzzy control algorithm optimized by a genetic algorithm, resulting in improved accuracy and robustness compared to conventional fuzzy controllers. Zhang et al. [27] effectively integrated the deep deterministic policy gradient algorithm with an adaptive multiple constraint controller for both path tracking control and obstacle avoidance. Their approach demonstrated commendable algorithmic efficiency and stability. Wang et al. [28] designed a spline-bridged elite-duplication genetic algorithm-deep reinforcement learning framework by combining an elite-duplication genetic algorithm with deep reinforcement learning. The integration of path planning, path smoothing, and path tracking control for an unmanned surface vehicle (USV) in congested waters was realized. Wang et al. [29] designed a data-driven performance-prescribed reinforcement learning control scheme to achieve optimal control without the need for USV modeling. It has good trajectory tracking control accuracy and stability in a complex environment with multiple obstacles. Cui et al. [30] developed a trajectory tracking controller based on deep reinforcement learning algorithms, yielding favorable outcomes in terms of control accuracy and stability.
Wang et al. [31] established a reinforcement learning-based optimal tracking control scheme. It has good accuracy and stability for the trajectory tracking control problem of a USV in the presence of complex unknowns. Trajectory tracking enables an underwater robot to follow a desired path according to a specified time pattern. It is very important for enhancing the motion accuracy of the robot and improving the efficiency of underwater work. However, the majority of existing studies on tracking control for bionic underwater robots primarily focus on path point tracking control and path tracking control. Trajectory tracking, which involves temporal-spatial synchronization, is a more complex task, leading to a limited number of studies in this particular area.
Therefore, this article addresses the issues above. To solve the trajectory planning problem in underwater multidynamic obstacle environments, we propose a trajectory planning method that combines temporal-spatial Bezier curves with an improved proximal policy optimization (IPPO) algorithm. First, we plan a smooth global trajectory using temporal-spatial Bezier curves. Then, we use the IPPO algorithm to perform real-time local trajectory replanning while the bionic underwater robot travels along the global trajectory. Dynamic obstacles on the global trajectory are avoided, and the bionic underwater robot quickly returns to the global trajectory. This improves planning performance while solving the trajectory planning problem of the bionic underwater robot in a multidynamic obstacle environment. Finally, a fuzzy PID controller is designed to control the bionic underwater robot to track the planned trajectory, and the performance of the proposed method is demonstrated through simulation and comparative experiments.
In the remainder of this paper, Materials and Methods describes the bionic underwater robot trajectory planning problem and details the trajectory planning scheme. Results describes the simulation experiment design and experimental results. Finally, Discussion gives conclusions and outlook.

RoboDact system design
This research article utilizes the bionic underwater robot, RoboDact, as an experimental platform to validate the proposed trajectory planning method. As shown in Fig. 1, RoboDact has a pair of wavy pectoral fins and a double-jointed caudal fin. It moves using a hybrid propulsion mode: it can swim fast in body/caudal fin mode while having good stability in median/paired fin mode. Three identical servo motors are mounted on each side of the main control cabin to drive the undulatory pectoral fins for stabilized movement. A 200-W DC motor is mounted at the rear of the main control cabin to drive the lumbar joint, while a 90-W DC motor and driver are mounted in the caudal cabin to jointly drive the tail fin to oscillate for fast swimming. In addition, the DC motor driver, power management unit, and servo motor driver are all installed in the main control cabin. Table 1 lists the relevant parameters of RoboDact [32].

RoboDact dynamics model
The trajectory planning problem addressed in this article pertains to 3-dimensional (3D) space. However, considering the characteristics of RoboDact, which include a large metacentric height and excellent static stability, this study disregards its motion in the pitch and roll degrees of freedom. Moreover, given the relatively low-speed motion of bionic underwater robots, this research simplifies the analysis by assuming linear damping and disregarding nonlinear damping effects [33]. As a result, the kinematic model of RoboDact can be represented as follows:

$$\dot{\eta} = J(\psi)\,\nu$$

where $\eta = [x, y, z, \psi]^T$ is the position and heading of the bionic underwater robot in the world coordinate system. $\nu = [u, v, w, r]^T$ is the velocity of the bionic underwater robot in the follower coordinate system. $J$ is the rotational transformation matrix, which can be expressed as:

$$J(\psi) = \begin{bmatrix} \cos\psi & -\sin\psi & 0 & 0 \\ \sin\psi & \cos\psi & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The dynamics model of RoboDact can be represented as follows:

$$M\dot{\nu} + C(\nu)\,\nu + D\,\nu = \tau + \tau_d$$

where $M$ is the mass and additional mass matrix. $D$ is the linear damping matrix. $C$ is the Coriolis force and centripetal force matrix. $\tau = [\tau_u, 0, \tau_w, \tau_r]^T$ describes body-fixed thrust and torque (on surge, heave, and yaw). $\tau_d = [\tau_{du}, \tau_{dv}, \tau_{dw}, \tau_{dr}]^T$ represents the disturbance force and torque (on surge, sway, heave, and yaw). Since RoboDact has a bilateral symmetric structure, the matrices $M$ and $D$ can be expressed as:

$$M = \mathrm{diag}\{m_{11}, m_{22}, m_{33}, m_{44}\}, \quad D = \mathrm{diag}\{d_{11}, d_{22}, d_{33}, d_{44}\}$$

Meanwhile, $C$ can be expressed as:

$$C(\nu) = \begin{bmatrix} 0 & 0 & 0 & -m_{22}v \\ 0 & 0 & 0 & m_{11}u \\ 0 & 0 & 0 & 0 \\ m_{22}v & -m_{11}u & 0 & 0 \end{bmatrix}$$

The quantities actually controlled on the bionic underwater robot are the wave parameters of its pair of pectoral fins and its caudal fin, including frequency $f$, amplitude $A$, and deflection angle $\theta$. Eq. 6 was used to fit the thrust or torque to these control parameters on the actual RoboDact machine, where the fitted coefficients, including $k_{r2}$, are characteristic parameters. Equation 6 is derived from a series of force measurements and motion experiments [34,35].
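To make the 4-degree-of-freedom model concrete, the following minimal sketch integrates the kinematics and dynamics with an explicit Euler step. It assumes diagonal $M$ and $D$ and the corresponding Coriolis term for a diagonal mass matrix. The function names, the step size, and all numerical values in the usage note are hypothetical placeholders, not RoboDact parameters.

```python
import math

def rotation_J(psi):
    """Rotation from the follower (body) frame to the world frame for heading psi."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def step(eta, nu, tau, tau_d, m, d, dt):
    """One explicit-Euler step of the 4-DOF model.

    eta = [x, y, z, psi], nu = [u, v, w, r].
    m and d hold the diagonal entries of M and D (assumed diagonal here).
    """
    u, v, w, r = nu
    # C(nu) @ nu for a diagonal M (assumed standard form)
    c_nu = [-m[1] * v * r, m[0] * u * r, 0.0, (m[1] - m[0]) * u * v]
    # M nu_dot = tau + tau_d - C(nu) nu - D nu
    nu_dot = [(tau[i] + tau_d[i] - c_nu[i] - d[i] * nu[i]) / m[i] for i in range(4)]
    J = rotation_J(eta[3])
    eta_dot = [sum(J[i][j] * nu[j] for j in range(4)) for i in range(4)]
    eta_next = [eta[i] + dt * eta_dot[i] for i in range(4)]
    nu_next = [nu[i] + dt * nu_dot[i] for i in range(4)]
    return eta_next, nu_next
```

With a constant surge thrust and no disturbance, the surge speed in this sketch settles at the thrust divided by the surge damping coefficient, which is a quick sanity check on the signs of the model.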

Trajectory planning problem description
The trajectory planning problem can be defined as follows: Given the physical limitations and global obstacles present in the environment, the objective is to devise a feasible trajectory that allows the bionic underwater robot to navigate from its initial position to the desired end position. Additionally, in the event of encountering local dynamic obstacles during the robot's traversal, the proposed approach enables real-time trajectory replanning to ensure obstacle avoidance and uninterrupted progress. As shown in Fig. 2, the initial position of RoboDact is $(x_s, y_s, z_s)$ in the world coordinate system $O_E X_E Y_E Z_E$. The target position is $(x_g, y_g, z_g)$ in the world coordinate system. $O_B X_B Y_B Z_B$ is the follower coordinate system. $C$ is the planned feasible trajectory. $s_i$ and $d_i$ are the static obstacles and dynamic obstacles in the environment, respectively. The trajectory planning problem is then expressed in terms of the following quantities: $F_s$ is the initial state of the bionic underwater robot. $F_g$ is the target state of the bionic underwater robot. $(vx_s, vy_s, vz_s)$ is the initial velocity of the bionic underwater robot. $(vx_g, vy_g, vz_g)$ is the target velocity of the bionic underwater robot.

Real-time dynamic trajectory planning
This section proposes a solution for the trajectory planning problem of the bionic underwater robot RoboDact. As shown in Fig. 3, the trajectory planning scheme mainly consists of a global trajectory planning part, a local trajectory planning part, and a motion controller. The first part plans a smooth global trajectory from the start point to the end point for RoboDact. The initial positions $(x_{si}, y_{si}, z_{si})$ of all dynamic obstacles $i$ are input together with the target state $F_g$, and the Bezier curve gives a smooth global trajectory $C_g$ based on this input. The second part is the local trajectory planning part. All obstacles in the environment move dynamically, so while RoboDact moves along the global trajectory $C_g$, obstacles may move onto it. This requires the improved PPO algorithm for real-time local trajectory replanning. The start position is set to the current position of the bionic underwater robot $(x_{st}, y_{st}, z_{st})$, and the target position $(x_{gt}, y_{gt}, z_{gt})$ is set to a suitable position on the global trajectory $C_g$ beyond the obstacles. The real-time states of all obstacles $(x_{ti}, y_{ti}, z_{ti}, v_{ti})$ are input into the trained model to give a local trajectory $C_l$ that brings RoboDact back to the global trajectory $C_g$. The third part is the motion controller, which controls the motion of the bionic underwater robot so that it follows the planned trajectory $C$ to the target position. The actual control inputs to RoboDact are the wave parameters of its 3 fins: frequency $f$, amplitude $A$, and deflection angle $\theta$.

Global trajectory planning section
In this section, we use temporal-spatial Bezier curves to provide global trajectory planning for RoboDact. We use cubic Bezier curves, which have 4 control points $P_0$, $P_1$, $P_2$, and $P_3$ and can be expressed as:

$$P(\tau) = \sum_{i=0}^{3} B_{i,3}(\tau)\,P_i, \quad \tau \in [0, 1]$$

where $B_{i,3}(\tau) = \binom{3}{i}\,\tau^{i}(1-\tau)^{3-i}$ is called the Bernstein basis function. $P_i$ are the coordinates of the $i$th control point.
The cubic Bezier curve has 2 properties:
• Endpoint property: For cubic Bezier curves, at $\tau = 0$ and $\tau = 1$, respectively:

$$P(0) = P_0, \quad P(1) = P_3$$

• Tangent vector property: Differentiating the above equation gives the directions of the tangent lines at the start and end points of the Bezier curve, respectively:

$$P'(0) = 3\,(P_1 - P_0), \quad P'(1) = 3\,(P_3 - P_2)$$

When the time axis is introduced, the cubic Bezier curve can be placed in 4-dimensional space-time, which can be expressed as:

$$\begin{bmatrix} x(\tau) \\ y(\tau) \\ z(\tau) \\ t(\tau) \end{bmatrix} = \sum_{i=0}^{3} B_{i,3}(\tau) \begin{bmatrix} x_i \\ y_i \\ z_i \\ t_i \end{bmatrix}$$

where $(x_i, y_i, z_i, t_i)$ is the spatiotemporal position of the control point $P_i$. By the 2 properties of the Bezier curve, only the temporal-spatial states at points $P_0$ and $P_3$ need to be known to determine the Bezier curve. At $\tau = 0$ and $\tau = 1$, dividing the spatial tangent by the temporal tangent gives the boundary velocities:

$$(vx_s, vy_s, vz_s) = \frac{(x_1 - x_0,\; y_1 - y_0,\; z_1 - z_0)}{t_1 - t_0}, \quad (vx_g, vy_g, vz_g) = \frac{(x_3 - x_2,\; y_3 - y_2,\; z_3 - z_2)}{t_3 - t_2}$$

That is to say, it is only necessary to determine the moments $t_i$ of the 4 control points to determine the spatial positions of $P_1$ and $P_2$. In addition, $t_0$ and $t_3$ are known, so the global trajectory planning problem for Bezier curves can be transformed into an optimization problem for the 2 time parameters $t_1$ and $t_2$. We use the particle swarm optimization algorithm to solve the optimization problem. The optimization objective is the shortest total distance $S$ of the trajectory, and $S$ can be expressed by accumulating each small distance between consecutive sample points $(x_j, y_j, z_j)$ along the trajectory:

$$S = \sum_{j=1}^{N} \sqrt{(x_j - x_{j-1})^2 + (y_j - y_{j-1})^2 + (z_j - z_{j-1})^2}$$

Considering the various constraints of the trajectory planning problem, the objective function is defined as the penalized distance

$$S + \sum_{k} M_k\,\delta_k$$

where the parameter $M_k$ is a very large number. $\delta_k$ denotes the constraints of the trajectory planning problem, including velocity constraints, acceleration constraints, curvature constraints, obstacle avoidance constraints, and so on.
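The construction above can be sketched in a few lines. The sketch evaluates a cubic Bezier curve in space-time, places the interior control points $P_1$ and $P_2$ from the boundary velocities once $t_1$ and $t_2$ are chosen, and accumulates the trajectory length $S$ over sampled points. The function names and the sampling count `n` are assumptions made for illustration.

```python
import math

def bernstein3(i, tau):
    """Cubic Bernstein basis B_{i,3}(tau)."""
    return math.comb(3, i) * tau**i * (1.0 - tau)**(3 - i)

def bezier4d(ctrl, tau):
    """Point (x, y, z, t) on the curve for four (x, y, z, t) control points."""
    return tuple(sum(bernstein3(i, tau) * ctrl[i][k] for i in range(4))
                 for k in range(4))

def control_points(p0, v_s, p3, v_g, t0, t1, t2, t3):
    """Place P1 and P2 so the curve leaves p0 with velocity v_s
    and arrives at p3 with velocity v_g."""
    p1 = tuple(p0[k] + v_s[k] * (t1 - t0) for k in range(3)) + (t1,)
    p2 = tuple(p3[k] - v_g[k] * (t3 - t2) for k in range(3)) + (t2,)
    return [tuple(p0) + (t0,), p1, p2, tuple(p3) + (t3,)]

def path_length(ctrl, n=200):
    """Spatial length S accumulated over n small segments."""
    pts = [bezier4d(ctrl, j / n) for j in range(n + 1)]
    return sum(math.dist(a[:3], b[:3]) for a, b in zip(pts, pts[1:]))
```

For example, `control_points((0, 2, 5), (1, 0, 0), (10, 10, 5.5), (1, 0, 0), 0, 3, 9, 12)` uses the start and end positions from the simulation section with made-up velocities and times. Only `t1` and `t2` remain free, which is what makes the 2-parameter optimization possible.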
Thus, the optimization problem can be expressed as:

$$\min_{t_1,\,t_2} \; S + \sum_{k} M_k\,\delta_k$$

The initial number of particles is set to $M$. The dimension is 2, corresponding to the 2 time parameters $t_1$ and $t_2$, and the maximum number of iterations is $T$. The particles are updated according to the expression:

$$v_{j,k+1} = c_0\,v_{j,k} + c_1 r_1\,(p_{j,k} - x_{j,k}) + c_2 r_2\,(g_k - x_{j,k}), \quad x_{j,k+1} = x_{j,k} + v_{j,k+1}$$

where $v_{j,k}$ and $x_{j,k}$ are the velocity and position of the $j$th particle at the $k$th iteration, respectively. $p_{j,k}$ is the optimal solution position of the $j$th particle in the previous $k$ iterations. $g_k$ is the optimal solution position of the whole particle swarm in the previous $k$ iterations. $c_0$, $c_1$, and $c_2$ are the population cognition coefficients, and $r_1$, $r_2$ are random numbers in $[0, 1]$. After $T$ iterations, the shortest distance can be finally obtained.
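As an illustration of the search over $t_1$ and $t_2$, here is a minimal particle swarm sketch using the textbook update with random factors, which is assumed rather than taken from the article. The coefficient values, the bound clamping, the single shared penalty weight `big_m`, and the function names are all hypothetical choices.

```python
import random

def pso_minimize(objective, bounds, n_particles=30, n_iter=200,
                 c0=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook particle swarm search; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p = [xi[:] for xi in x]                      # personal best positions
    p_val = [objective(xi) for xi in x]
    g_idx = min(range(n_particles), key=p_val.__getitem__)
    g, g_val = p[g_idx][:], p_val[g_idx]         # swarm best
    for _ in range(n_iter):
        for j in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[j][k] = (c0 * v[j][k]
                           + c1 * r1 * (p[j][k] - x[j][k])
                           + c2 * r2 * (g[k] - x[j][k]))
                lo, hi = bounds[k]
                # keep the particle inside the search box
                x[j][k] = min(max(x[j][k] + v[j][k], lo), hi)
            val = objective(x[j])
            if val < p_val[j]:
                p[j], p_val[j] = x[j][:], val
                if val < g_val:
                    g, g_val = x[j][:], val
    return g, g_val

def penalized(length, violations, big_m=1e6):
    """Objective S + sum_k M_k * delta_k, with one shared M (an assumption)."""
    return length + big_m * sum(violations)
```

In a planner, `objective` would build the Bezier curve for a candidate `(t1, t2)`, measure its length, and pass the constraint violations to `penalized`. The large penalty makes any infeasible candidate lose to every feasible one.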

Local trajectory planning section
In this section, the improved PPO algorithm is used to provide local trajectory planning for RoboDact. As shown in Fig. 4, when the bionic underwater robot advances along the global trajectory and detects a dynamic obstacle moving onto the global trajectory within a safe distance $d_s$ in front of it, it carries out real-time local trajectory replanning through the improved PPO algorithm to bypass the obstacle and get back to the global trajectory. The obstacles in the environment are spherical, with their surface located at radius $r$ from the center of the sphere. The starting point of local trajectory planning is set as the current position $(x_{st}, y_{st}, z_{st})$ of the bionic underwater robot. Since a target position set too close to the starting point may reduce the effectiveness of local trajectory planning, the point on the global trajectory at $2d_s$ from the obstacle surface is taken as the target position $(x_{gt}, y_{gt}, z_{gt})$. The trained strategy model is then used to make the bionic underwater robot go around the obstacle and quickly return to the global trajectory. Note that when the bionic underwater robot is about to reach the target position and detects an obstacle moving toward the target position, the target position is updated to a point $2d_s$ from the surface of the new obstacle and local trajectory planning continues. During the local trajectory planning process, the bionic underwater robot interacts directly with the environment. The state $s_{t+1}$ of the bionic underwater robot at the next moment depends only on the state $s_t$ at the preceding moment.
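The trigger and target selection described above can be sketched as follows. The sketch assumes the global trajectory is available as an ordered list of sampled 3D points and that "beyond the obstacle" can be decided by index order along that list. Both function names are hypothetical.

```python
import math

def needs_replanning(robot, obstacle_center, radius, d_s):
    """True if the obstacle surface is within the safe distance d_s of the robot."""
    return math.dist(robot, obstacle_center) - radius <= d_s

def local_target(global_path, start_index, obstacle_center, radius, d_s):
    """Index and point of the first sample ahead on the global trajectory that
    has left the 2*d_s shell around the obstacle surface."""
    inside = False
    for i in range(start_index, len(global_path)):
        gap = math.dist(global_path[i], obstacle_center) - radius
        if gap < 2 * d_s:
            inside = True                  # still within 2*d_s of the surface
        elif inside:
            return i, global_path[i]       # first sample after leaving the shell
    # no suitable sample found: fall back to the end of the trajectory
    return len(global_path) - 1, global_path[-1]
```

When a new obstacle approaches the chosen target, calling `local_target` again with the new obstacle's center and radius reproduces the target update described in the text.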
The state of the IPPO algorithm is designed as:

$$s = [d_{i,obs},\; v_{i,obs},\; d_{goal}]$$

where $d_{i,obs}$ and $v_{i,obs}$ are the distance and speed of the bionic underwater robot relative to all obstacles within the current safe distance, respectively. $d_{goal}$ is the distance of the bionic underwater robot from the target location.
Action $a$ is designed as the position information of the bionic underwater robot at the next moment, denoted as:

$$a = [\rho,\; \sigma,\; \theta]$$

where $\rho$, $\sigma$, and $\theta$ are the angles between the direction from the current position of the bionic underwater robot to its next position and the 3 coordinate axes of the world coordinate system, according to which the state $s$ of the bionic underwater robot at the next moment can be updated.
During real-time local trajectory planning, the bionic underwater robot needs to avoid all dynamic obstacles at each moment while quickly reaching the target position and returning to the global trajectory. Therefore, the reward function needs to take both tasks into account. In the reward function, $dis_{obs}$ is the distance between the bionic underwater robot and the center of the nearest obstacle. $r$ is the radius of the obstacle. $r + 0.9$ is the distance from the obstacle center at which the bionic underwater robot keeps a safe distance of 0.9 m from the surface of the obstacle. The first 2 items of the reward function cause the bionic underwater robot to avoid obstacles by applying varying degrees of punishment and to maintain a safe distance from the obstacles as much as possible. Inside the safety shell, that is, for $r < dis_{obs} < r + 0.9$, the punishment is graded as:

$$\frac{dis_{obs} - r - 0.9}{r + 0.9} - 0.5$$

$dis_{goal}$ is the distance between the bionic underwater robot and the end point. $dis_{sgoal}$ is the distance between the start point and the end point. The latter 2 items make the bionic underwater robot converge to the end point through reward and punishment. The reward and penalty coefficients in the reward function were tuned over several training runs by observing the training effect.
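A reward with this two-task structure might look like the sketch below. Only the graded term $(dis_{obs} - r - 0.9)/(r + 0.9) - 0.5$ for $r < dis_{obs} < r + 0.9$ is taken from the article. The collision penalty, the goal bonus, the goal tolerance, and the distance-ratio shaping term are hypothetical stand-ins for the tuned coefficients, which are not recoverable here.

```python
def reward(dis_obs, r, dis_goal, dis_sgoal,
           collision_penalty=-10.0, goal_bonus=10.0, goal_tol=0.1):
    """Illustrative reward: obstacle avoidance terms plus goal-seeking terms."""
    margin = 0.9                       # safe distance from the obstacle surface (m)
    total = 0.0
    if dis_obs <= r:
        total += collision_penalty     # assumed value: robot touches the obstacle
    elif dis_obs < r + margin:
        # graded penalty inside the safety shell (term taken from the article)
        total += (dis_obs - r - margin) / (r + margin) - 0.5
    if dis_goal <= goal_tol:
        total += goal_bonus            # assumed value: target reached
    else:
        total += -dis_goal / dis_sgoal  # assumed shaping: closer is better
    return total
```

The graded term gets more negative as the robot moves deeper into the safety shell, so the policy is pushed outward before a collision penalty is ever incurred.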
The IPPO algorithm is a reinforcement learning algorithm based on the Actor-Critic network, with good stability and convergence speed. The Actor network input dimension is the state dimension state_dim, and its output dimension is action_dim, giving the mean of the Gaussian policy for each action. The Critic network input dimension is the state dimension, and its output is the value $V$ of the evaluated state. The IPPO algorithm is an improved policy gradient algorithm whose core operation is the loss function. It modifies the loss function to reduce the sensitivity of the policy gradient to the learning rate. The loss function can be expressed as:

$$L(\theta) = \hat{E}_t\left[\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)}\,\hat{A}_t\right]$$

where $\pi_\theta(a_t|s_t)$ and $\pi_{\theta_{old}}(a_t|s_t)$ are the new and old policies, respectively. $\hat{A}_t$ is the advantage function.
When the updated parameters differ too much from those of the old network, the policy network is over-updated and falls into a local optimum, which is not conducive to convergence. Therefore, IPPO introduces the CLIP function for limitation, and the loss function is finally expressed as:

$$L^{CLIP}(\theta) = \hat{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \quad r_t(\theta) = \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{old}}(a_t|s_t)}$$

where $\epsilon$ is the clip parameter. The advantage function $\hat{A}_t$ can be expressed as:

$$\hat{A}_t = Q^{\pi}(s_t, a_t) - V^{\pi}(s_t)$$

where $V^{\pi}(s_t)$ is the expected value of rewards for all actions in state $s_t$, which is the weighted average value. $Q^{\pi}(s_t, a_t)$ is the reward value obtained by performing action $a_t$ in state $s_t$. The model can then be trained using this loss function.
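The effect of clipping is easiest to see numerically. The sketch below computes the standard PPO clipped surrogate over a small batch in plain Python. It covers only the generic clipped objective, not the article's specific improvements, and the function names are hypothetical.

```python
import math

def probability_ratio(logp_new, logp_old):
    """pi_theta(a|s) / pi_theta_old(a|s), computed from log-probabilities."""
    return math.exp(logp_new - logp_old)

def advantage(q_value, v_value):
    """A = Q(s, a) - V(s)."""
    return q_value - v_value

def clipped_surrogate(ratios, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the batch."""
    terms = []
    for rt, adv in zip(ratios, advantages):
        clipped = min(max(rt, 1.0 - eps), 1.0 + eps)
        terms.append(min(rt * adv, clipped * adv))
    return sum(terms) / len(terms)
```

With `eps=0.2`, a ratio of 2.0 on a positive advantage of 1.0 contributes 1.2 rather than 2.0, so a single large policy change cannot dominate the update. Training maximizes this quantity, or equivalently minimizes its negative.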

Motion controller
In this section, a fuzzy PID controller is used as the motion controller of RoboDact. After the trajectory planning is completed, RoboDact receives the control signal to track the planned trajectory through the motion controller. The fuzzy PID controller is based on the PID algorithm. The parameters of the PID controller $K_p$, $K_i$, $K_d$ are adjusted in real time by the fuzzy control method, so that the motion controller has a certain degree of adaptive ability. The controller inputs can be expressed as:

$$e(k) = x_d(k) - x_c(k), \quad ec(k) = e(k) - e(k-1)$$

where $e(k)$ is the input error. $ec(k)$ is the derivative of the input error. $x_c$ is the current position of the robotic fish, and $x_d$ is the desired position.
The PID control law can be expressed by the following equation:

$$u(k) = K_p\,e(k) + K_i \sum_{j=0}^{k} e(j) + K_d\,ec(k)$$

where $u(k)$ is the controller output and $K_p$, $K_i$, $K_d$ are the proportional, integral, and derivative gains of the PID controller, respectively. All 3 parameters are calculated by the fuzzy controller, which uses Mamdani-type inference for adaptive optimization of the gains.
The fuzzy controller is generated with reference to the following 3 key principles [36]:
• The proportional gain $K_p$ has the effect of reducing the rise time and steady-state error. Therefore, when the input error $e$ is large, the $K_p$ value should be increased appropriately to improve the response speed of the robotic fish. When $e$ is medium in size, a smaller value of $K_p$ is taken so that the system has less overshoot while keeping its response speed. When $e$ is small, the $K_p$ value is adjusted to a larger value to reduce the static error and improve the control accuracy. The fuzzy rule control table for $K_p$ is shown in Table 2.
• The integral gain $K_i$ has the effect of eliminating static errors. When the input error $e$ is large, $K_i$ should be small to prevent integral saturation. When $e$ is medium in size, $K_i$ should be relatively moderate to avoid affecting stability. When $e$ is small, $K_i$ should be increased to reduce the static error. The fuzzy rule control table for $K_i$ is shown in Table 3.
• The derivative gain $K_d$ has the effect of smoothing the dynamics of the system. When the input error $e$ is large, $K_d$ should be increased to obtain a smaller overshoot. When $e$ is medium in size, $K_d$ should be appropriately small and kept constant. When $e$ is small, $K_d$ should be small to limit its effect on the controlled process. The fuzzy rule control table for $K_d$ is shown in Table 4.
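The three principles can be turned into a compact gain scheduler. The sketch below is a simplification, not the controller of the article: it schedules on the error magnitude only, and it replaces full Mamdani inference with triangular memberships and a weighted average of singleton outputs. The gain values in `GAINS` are hypothetical and merely follow the qualitative rules above, since the actual rule tables are not reproduced here.

```python
def memberships(e_abs, e_max):
    """Degrees of 'small', 'medium', 'large' for |e| on [0, e_max]."""
    x = min(e_abs / e_max, 1.0)
    small = max(1.0 - 2.0 * x, 0.0)
    large = max(2.0 * x - 1.0, 0.0)
    medium = 1.0 - small - large
    return small, medium, large

# Hypothetical singleton outputs for (small, medium, large) error.
GAINS = {
    "kp": (2.0, 1.0, 3.0),   # larger for large and small e, smaller in between
    "ki": (0.5, 0.2, 0.0),   # small for large e to avoid integral saturation
    "kd": (0.1, 0.2, 0.6),   # larger for large e to limit overshoot
}

class FuzzyPID:
    def __init__(self, e_max, dt):
        self.e_max, self.dt = e_max, dt
        self.integral = 0.0
        self.prev_e = 0.0

    def gains(self, e):
        """Blend the singleton gains by the membership degrees of |e|."""
        w = memberships(abs(e), self.e_max)
        return {name: sum(wi * gi for wi, gi in zip(w, vals))
                for name, vals in GAINS.items()}

    def update(self, x_d, x_c):
        """One control step for desired position x_d and current position x_c."""
        e = x_d - x_c
        ec = (e - self.prev_e) / self.dt
        k = self.gains(e)
        self.integral += e * self.dt
        self.prev_e = e
        return k["kp"] * e + k["ki"] * self.integral + k["kd"] * ec
```

One controller instance per controlled axis (surge, heave, yaw) would be a natural arrangement, each with its own `e_max`.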

Results
This section begins by verifying the effectiveness of the trajectory planning method proposed in this research paper within complex environments containing 5 dynamic obstacles. Through comparative experiments with traditional methods such as A* and rapidly exploring random tree (RRT) algorithms, we demonstrate the superiority of our approach in addressing trajectory planning problems. Additionally, we conduct semiphysical simulation experiments using the UWSim platform.
Prior to these experiments, it is essential to train the local trajectory planning IPPO algorithm. During training, various parameters such as the starting point, end point, and dynamic obstacles within the environment are randomly simulated. Subsequently, the IPPO algorithm utilizes the data obtained from the interaction between the bionic underwater robot and the simulated environment to iteratively update the weights of the neural network until convergence is achieved, resulting in the acquisition of a trained strategy model.

Trajectory planning simulation and comparative analysis
In this section, we first validate the planning effect of the trajectory planning method in a complex environment with 5 dynamic obstacles. The initial and end positions of the bionic underwater vehicle (BUV) are (0,2,5) and (10,10,5.5) in meters.
Five dynamic obstacles with known initial positions are also set up, each moving according to a prescribed function. The information of the given obstacles is listed in Tables 5 and 6, with lengths in meters.
We get the results shown in Fig. 5 by numerical simulation in MATLAB. As can be seen from the figure, the motion process of the bionic underwater robot is divided into 3 phases: Phase 1: From Fig. 5A to Fig. 5B, the BUV performs global trajectory planning based on the initial positions of all obstacles and moves along the Bezier curve. By the moment of Fig. 5B, the BUV detects the presence of obstacles within a safe distance on the global trajectory, so it starts local trajectory planning to avoid obstacles.
Phase 2: From Fig. 5B to Fig. 5G, the BUV performs local trajectory planning, which determines the starting point and target position of the local trajectory planning. Bypassing the obstacles, it reaches the target position at the moment of Fig. 5G and returns to the Bezier curve. Phase 3: From Fig. 5G to Fig. 5I, the BUV continues along the global trajectory. It reaches the end point at the moment of Fig. 5I.
From Fig. 5, it can be seen that the method in this paper can plan a trajectory that bypasses all the obstacles in a complex environment with 5 dynamic obstacles. Also, from Fig. 5F, the BUV can quickly move forward to the target location and return to the global trajectory after bypassing the obstacles by local trajectory planning.
The comparison results are shown in Fig. 7. Table 7 gives the comparison results of the 4 algorithms in terms of planning time and distance traveled, where distance is in meters and time is in seconds.
Our method is better than the A* algorithm in planning time, planning path length, and trajectory smoothness. Compared with the RRT algorithm, our method is significantly better in trajectory length and smoothness, although it is not as good in planning time. Due to its algorithmic principle, the RRT algorithm produces different trajectories each time, making it difficult to obtain good trajectory curves. The better result of RRT over multiple simulations is given in Fig. 7. Compared with the PPO algorithm alone, the method in this paper combines Bezier curves with the PPO algorithm and is significantly better in terms of trajectory length and trajectory smoothness, although it takes slightly more planning time. Therefore, the method in this paper has certain advantages in the multidynamic obstacle trajectory planning problem.

UWSim semiphysical simulation
This section presents the simulation experimental results of the proposed method using the UWSim platform. UWSim is a simulation platform for underwater robots that is based on ROS. It facilitates the loading of underwater robot models and provides an environment for testing integrated perception and control algorithms on robots [37]. As shown in Fig. 8, the UWSim simulation platform mainly includes interaction modules, dynamics modules, core modules, underwater scene modules, etc. The interaction module enables communication with the external environment, while the dynamics module ensures the accurate representation of underwater robot dynamics. The core module is responsible for loading the robot URDF model and the main scene model. Lastly, the underwater scene module is employed to visualize underwater scenes and obstacles.
We converted the RoboDact robot fish model into a URDF file using SolidWorks and successfully imported it into UWSim. Additionally, we designed a 3D fish-shaped dynamic obstacle using 3D MAX and imported its corresponding osg file into UWSim. The simulated underwater environment provided by UWSim was utilized for our experiments. Both the BUV model and the fish-shaped dynamic obstacle were loaded into the simulation environment, allowing us to conduct our tests effectively.
The start position and end position of the BUV are (0,2,5) and (10,10,5.5) in meters. We set up 2 fish-shaped dynamic obstacles between the start and the end points, which move in a circular trajectory and a linear trajectory, respectively. After planning a global trajectory for the BUV through the Bezier curve global trajectory planning method, the BUV is controlled by a fuzzy PID controller to track the trajectory for motion in the simulation environment. When an obstacle fish is encountered, real-time local trajectory planning is performed by the IPPO algorithm to avoid the obstacle and return to the global trajectory. The simulation results are shown in Fig. 9. The trajectories of the BUV and the obstacle fishes are shown in Fig. 10.
As can be seen from the figure, at 0 s, the BUV is located at the starting position and starts to advance along the Bezier curve global trajectory. At 14.4 s, a fish-shaped obstacle is detected and the BUV prepares for real-time local trajectory planning. The results report, at various moments, the body-fixed thrust and torque (on surge, heave, and yaw), the actual trajectory of the BUV compared with the desired trajectory, the variation of the tracking error, and the distance between the BUV and the obstacles. Notice that the control inputs in the simulation are propulsive force and torque, whereas the actual control inputs of RoboDact are the wave parameters of its pair of pectoral fins and its caudal fin; the relationship between them is given by Eq. 6. In Fig. 13, we set the BUV to maintain a distance of 0.2 m from the current trajectory target point, which is much smaller than the size of RoboDact and therefore acceptable.
In the case of external interference, the BUV maintains the distance to the target point well, with a small error. Therefore, the controller has good accuracy and anti-interference ability. The red dashed line in Fig. 13 shows the safety distance of the BUV, which is 0.9 m. In Fig. 13, $d_1$ and $d_2$ are the distances of the BUV from barrier fish 1 and barrier fish 2, respectively. The BUV keeps a safe distance from the 2 barrier fish during trajectory tracking. Thus, it can be concluded that the fuzzy PID controller in this paper can effectively control the BUV to track the desired trajectory with good accuracy and stability.
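The two quantities discussed above, the tracking error and the clearances $d_1$ and $d_2$, can be computed from logged positions as in the following sketch. The 0.2 m and 0.9 m thresholds are taken from the text; the function name, the array layout (one row per time step), and the returned dictionary are assumptions made for illustration.

```python
import numpy as np

TRACK_DIST = 0.2  # m, distance maintained to the target point (from the text)
SAFE_DIST = 0.9   # m, safety distance (from the text)


def tracking_report(actual, desired, obstacle_tracks):
    """Summarize tracking error and obstacle clearance.

    actual, desired: arrays of shape (T, 3) sampled at the same instants.
    obstacle_tracks: list of (T, 3) arrays, one per obstacle.
    """
    actual = np.asarray(actual, dtype=float)
    desired = np.asarray(desired, dtype=float)
    error = np.linalg.norm(actual - desired, axis=1)

    # Per-obstacle distance over time, i.e. d_1(t), d_2(t), ...
    clearances = [
        np.linalg.norm(actual - np.asarray(track, dtype=float), axis=1)
        for track in obstacle_tracks
    ]
    min_clearance = min(float(c.min()) for c in clearances)

    return {
        "max_error": float(error.max()),
        "mean_error": float(error.mean()),
        "min_clearance": min_clearance,
        "safe": bool(min_clearance >= SAFE_DIST),
    }
```

Applied to logged simulation data, `safe` is true only if every $d_i$ stays at or above the red dashed line of Fig. 13 for the whole run.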

Discussion
In this paper, we propose a trajectory planning method that combines Bezier curves and the IPPO algorithm. The Bezier curve is first employed to plan a global trajectory, while the IPPO algorithm is responsible for local trajectory planning when the BUV encounters obstacles during its movement. This enables the BUV to bypass obstacles and quickly return to the global trajectory. Through experiments conducted in an environment with multiple dynamic obstacles and comparative analyses with the A* and RRT algorithms, we demonstrate the effectiveness of the proposed method and highlight its advantages in terms of trajectory length, planning time, and trajectory smoothness. Furthermore, we validate the approach through semiphysical simulation experiments with the RoboDact model in the UWSim simulation environment, in which the BUV reaches the end position while keeping the safety distance from both obstacle fish.
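The switching between global tracking and local replanning summarized above can be sketched as a two-mode state machine. This is an illustrative simplification, not the decision logic of the paper: the mode names, the function signature, and the rejoin tolerance are hypothetical, and only the 0.9 m safety distance is taken from the text.

```python
from enum import Enum


class Mode(Enum):
    GLOBAL = "global"  # tracking the Bezier global trajectory
    LOCAL = "local"    # following a locally replanned avoidance trajectory


def next_mode(mode, obstacle_dist, dist_to_global, safe_dist=0.9, rejoin_tol=0.2):
    """Return the planning mode for the next control step.

    obstacle_dist: distance (m) to the nearest obstacle on the path ahead.
    dist_to_global: distance (m) from the BUV to the global trajectory.
    rejoin_tol: hypothetical tolerance for declaring the global path rejoined.
    """
    if mode is Mode.GLOBAL and obstacle_dist < safe_dist:
        # Obstacle inside the safety distance: hand over to local planning.
        return Mode.LOCAL
    if (
        mode is Mode.LOCAL
        and obstacle_dist >= safe_dist
        and dist_to_global <= rejoin_tol
    ):
        # Obstacle cleared and global path reached again: resume tracking.
        return Mode.GLOBAL
    return mode
```

Keeping the rejoin condition separate from the clearance condition prevents the sketch from switching back to global tracking while the BUV is still far from the global curve.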
In the future, we intend to conduct physical experiments in which the trajectory planning and tracking control methods proposed in this paper are implemented on the RoboDact prototype. Furthermore, we plan to study the trajectory tracking control algorithm for bionic underwater robots in more depth, with the aim of improving tracking accuracy. These efforts will allow us to refine the methods for application in real-world scenarios.

Fig. 3. The block diagram of the trajectory planning system.
by numerical simulation in MATLAB. As can be seen from the figure, the motion process of the bionic underwater robot is divided into 4 stages:

Phase 1: From Fig. 6A to Fig. 6B, the BUV performs global trajectory planning based on the initial positions of all obstacles and moves along the Bezier curve. By the moment of Fig. 6B, the BUV detects the presence of obstacles within a safe distance on the global trajectory, so it starts local trajectory planning to avoid obstacles.

Phase 2: From Fig. 6B to Fig. 6C, the BUV performs local trajectory planning, determines the starting point and target location of the local trajectory planning, and bypasses the obstacle. Obstacle 5, located at the end point of the local trajectory, is detected at the moment of Fig. 6C, so the end point of the local trajectory planning is redetermined and the local trajectory

Fig. 6. BUV 3-dimensional space real-time trajectory planning results for the case of local end point change.
From 17.3 to 24.5 s, the bionic underwater robot advances on the local trajectory to bypass the fish-shaped obstacle. At 29.2 s, it returns to the global trajectory, and it reaches the end position at 40 s. The method of this paper therefore still performs satisfactorily in the UWSim simulation environment. Figures 11 to 13 give the outputs of the controller at

Fig. 13. Time evolution of the tracking error and the distance between BUV and 2 obstacles.

Table 5. Obstacle information

Fuzzy control rules for $K_p$

Table 3. Fuzzy control rules for $K_i$

Table 4. Fuzzy control rules for $K_d$

Table 6. Obstacle information

Table 7. Comparison results